Towards a Better Detection of Horizontally Transferred Genes by Combining Unusual Properties Effectively

Authors

  • Dapeng Xiong
  • Fen Xiao
  • Li Liu
  • Kai Hu
  • Yanping Tan
  • Shunmin He
  • Xieping Gao
Abstract

BACKGROUND Horizontal gene transfer (HGT) is one of the major mechanisms contributing to microbial genome diversification. Many computational methods for finding horizontally transferred genes have been proposed in past decades; however, none has yet provided a reliable detector. Existing parametric approaches either use a single compositional property in the detection process or simply combine the results obtained from each property separately. Because different properties carry different information, a single property cannot sufficiently capture the information encoded in gene sequences. In addition, published methods have not considered the class imbalance problem in the datasets, which also causes large errors in gene detection. Here we developed an effective classifier system (Hgtident) that uses a support vector machine (SVM) and combines unusual properties effectively for HGT detection.

RESULTS Our approach, Hgtident, comprises the introduction of more representative datasets, optimization of the SVM model, feature selection, handling of the class imbalance problem in the datasets, and extensive performance evaluation via systematic cross-validation. Through feature selection, we found that JS-DN and JS-CB have higher discriminating power for HGT detection, while GC1-GC3 and k-mer (k = 1, 2, …, 7) make the least contribution. Extensive experiments indicated that the new classifier could reduce Mean error dramatically and also improve Recall to some extent. For the testing genomes, compared with the existing popular multiple-threshold approach, on average our Recall was improved by 2.81% and our Mean error was reduced by 26.32%, which means that numerous false positives were identified correctly.

CONCLUSIONS Hgtident is an effective approach for better detection of HGT. Combining multiple features of HGT is also essential for detecting a wider range of HGT events.
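As an illustration of the kind of compositional feature the abstract refers to, the sketch below scores how far a gene's dinucleotide composition departs from that of its genome using the Jensen-Shannon divergence. It rests on an assumption: the abstract does not define JS-DN, and reading it as "Jensen-Shannon divergence over dinucleotide frequencies" is a guess, not the authors' stated definition. The function names (`kmer_distribution`, `js_divergence`, `js_dinucleotide_score`) are hypothetical and are not part of Hgtident.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_distribution(seq, k=2):
    """Relative frequencies of overlapping k-mers over the ACGT alphabet.

    K-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions, at most 1."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    mix = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def js_dinucleotide_score(gene, genome):
    """Assumed stand-in for a JS-DN-style feature: gene versus genome background."""
    return js_divergence(kmer_distribution(gene, 2), kmer_distribution(genome, 2))
```

A score like this would be one entry in a gene's feature vector; a higher value marks a gene whose composition is more atypical for its genome. How Hgtident actually computes and combines its features is not specified in the abstract.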


Similar resources

Integration of horizontally transferred genes into regulatory interaction networks takes many million years.

Adaptation of bacteria to new or changing environments is often associated with the uptake of foreign genes through horizontal gene transfer. However, it has remained unclear how (and how fast) new genes are integrated into their host's cellular networks. Combining the regulatory and protein interaction networks of Escherichia coli with comparative genomics tools, we provide the first systemati...


Towards more robust methods of alien gene detection

Because the properties of horizontally-transferred genes will reflect the mutational proclivities of their donor genomes, they often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, compositional patterns of native genes vary stochastically, leaving no clear ...


Horizontal Gene Transfer : Effect and Affect on Computational

Through the efforts of the human genome project, horizontally and laterally transferred genes have been implicated as the source of bacterial protein homologs in the human genome. Additionally, similar protein homologs have been identified in archaeal and bacterial species. Controversy continually surrounds the estimations of numbers of horizontally transferred genes, the time and distance ...


Towards a Simpler Photoautotrophic Cell - Conserved and Variable Genes in Synechococcus Elongatus

Simpler biological systems should be easier to understand and engineer. One way to achieve biological simplicity is through genome minimization. Here we have looked for genomic islands in the fresh water cyanobacterium Synechococcus elongatus PCC 7942 that could be used as targets for deletion for genome minimization. By using a combination of methods we have identified 184 genes that have been...


A computational tool for the genomic identification of regions of unusual compositional properties and its utilization in the detection of horizontally transferred sequences.

Similarity Plot (S-plot) is a Windows-based application for large-scale comparisons and 2-dimensional visualization of compositional similarities between genomic sequences. This application combines 2 approaches widely used in genomics: window analysis of statistical characteristics along genomes and dot-plot visual representation. S-plot is effective in identifying highly similar regions betwe...



Journal title:

Volume 7  Issue 

Pages  -

Publication date 2012